Comparative density peaks clustering algorithm with automatic determination of clustering center
GUO Jia, HAN Litao, SUN Xianlong, ZHOU Lijuan
Journal of Computer Applications    2021, 41 (3): 738-744.   DOI: 10.11772/j.issn.1001-9081.2020071071
To address the problems that the Density Peaks Clustering (DPC) algorithm cannot determine clustering centers automatically and that clustering centers are not clearly separated from non-center points in the decision graph, a Comparative density Peaks Clustering algorithm with Automatic determination of clustering center (ACPC) was designed. Firstly, the distance parameter was replaced by a distance comparison quantity, making potential clustering centers more prominent in the decision graph. Then, a two-dimensional interval estimation method was used to select the clustering centers automatically, so that the clustering process required no manual intervention. Experimental results show that the ACPC algorithm achieves better clustering results on four synthetic datasets; comparison of the Accuracy indicator on real datasets shows that on the Iris dataset the clustering accuracy of ACPC reaches 94%, which is 27.3% higher than that of the traditional DPC algorithm. ACPC thus eliminates the interactive selection of clustering centers.
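The abstract does not give enough detail to reproduce the distance comparison quantity or the 2D interval estimation step, so the sketch below shows only the standard DPC decision-graph quantities (local density rho and separation delta), with a simple gamma = rho * delta outlier threshold standing in for ACPC's automatic center selection; `dc_quantile` and `n_std` are illustrative parameters, not values from the paper.

```python
# Minimal sketch of the DPC decision graph, assuming Euclidean distances.
# A gamma-outlier rule stands in for the paper's 2D interval estimation.
import numpy as np

def dpc_centers(X, dc_quantile=0.02, n_std=3.0):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distance matrix
    dc = np.quantile(d[d > 0], dc_quantile)                     # cutoff distance d_c
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0              # Gaussian-kernel local density
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        # distance to the nearest higher-density point; the densest point gets the max distance
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    gamma = rho * delta                                         # decision-graph score
    # stand-in automatic rule: centers are gamma outliers above mean + n_std * std
    return np.where(gamma > gamma.mean() + n_std * gamma.std())[0]
```

As in standard DPC, the remaining points would then be assigned to the cluster of their nearest higher-density neighbor.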
Credit assessment method based on majority weight minority oversampling technique and random forest
TIAN Chen, ZHOU Lijuan
Journal of Computer Applications    2019, 39 (6): 1707-1712.   DOI: 10.11772/j.issn.1001-9081.2018102180
To address the class imbalance of credit assessment datasets and the limited performance of a single classifier on imbalanced data, a Majority Weighted Minority Oversampling TEchnique-Random Forest (MWMOTE-RF) credit assessment method was proposed. Firstly, MWMOTE was applied in the preprocessing stage to synthesize additional minority-class samples. Then, the random forest algorithm, a supervised machine learning method, was used to classify and predict the preprocessed balanced dataset. With Area Under the Curve (AUC) used to evaluate classifier performance, experiments were conducted on the German credit card dataset from the UCI repository and a company's car loan default dataset. The results show that the AUC value of MWMOTE-RF increases by 18% and 20% respectively compared with the random forest and Naive Bayes methods on the same datasets. When random forest was instead combined with Synthetic Minority Over-sampling TEchnique (SMOTE) and ADAptive SYNthetic over-sampling (ADASYN), the AUC value of MWMOTE-RF was still 1.47% and 2.34% higher respectively. The results demonstrate the effectiveness of the proposed method and its improvement of classifier performance.
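As a rough illustration of the oversample-then-classify pipeline the abstract describes: MWMOTE has no implementation in imbalanced-learn, so SMOTE (one of the paper's own baselines) stands in for the oversampling step below, and the synthetic dataset is only a placeholder for the credit data.

```python
# Sketch of the oversample-then-classify pipeline with AUC evaluation,
# using SMOTE as a stand-in for MWMOTE.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# placeholder imbalanced data (~10% minority class)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # balance the training split only
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```

Note that only the training split is oversampled, so the AUC is measured on untouched test data and is not inflated by synthetic samples.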
Multi-label classification algorithm based on joint probability
HE Peng, ZHOU Lijuan
Journal of Computer Applications    2015, 35 (3): 659-662.   DOI: 10.11772/j.issn.1001-9081.2015.03.659
Since the Multi-Label k Nearest Neighbor (ML-kNN) algorithm ignores the correlation between labels, a multi-label classification algorithm based on joint probability was proposed. Firstly, the prior probabilities were calculated while traversing the sample space. Secondly, the conditional probability that a label appears m times among the k nearest neighbors, given that the label takes value 1 or 0, was computed. Then, the joint probability distribution of labels, also computed while traversing the sample space, was used as the multi-label classification model. Finally, the coRrelation Multi-Label-kNN (RML-kNN) classification model was derived by maximizing the posterior probability. Theoretical analysis and comparison experiments on several datasets show that RML-kNN raises Subset Accuracy to 0.9612 in the best case, a 2.25% improvement over ML-kNN; RML-kNN significantly reduces Hamming Loss, reaching a minimum of 0.0022; and Micro-F-Measure rises to 0.9767, a 2.88% improvement over ML-kNN in the best case. The experimental results show that RML-kNN outperforms ML-kNN because it integrates label correlation into the classification process.
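For context, the sketch below implements the per-label MAP rule of the ML-kNN baseline that RML-kNN extends: the smoothed label prior, and the conditional probability of observing m positive neighbors among the k nearest given that the label is 1 or 0. The joint label distribution that distinguishes RML-kNN is not specified in the abstract and is not reproduced here; `s` is an assumed Laplace-smoothing constant, and test points are assumed not to appear in the training set.

```python
# Minimal ML-kNN sketch: smoothed priors, neighbor-count conditionals, per-label MAP.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mlknn_fit_predict(X_train, Y_train, X_test, k=10, s=1.0):
    n, q = Y_train.shape                                        # Y_train: binary (n, q) label matrix
    prior1 = (s + Y_train.sum(axis=0)) / (2 * s + n)            # smoothed P(label_j = 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    idx = nn.kneighbors(X_train, return_distance=False)[:, 1:]  # drop each point itself
    counts = Y_train[idx].sum(axis=1)                           # positive neighbors per (sample, label)
    c1 = np.zeros((k + 1, q))                                   # label present, m positive neighbors
    c0 = np.zeros((k + 1, q))                                   # label absent, m positive neighbors
    for i in range(n):
        for j in range(q):
            (c1 if Y_train[i, j] else c0)[counts[i, j], j] += 1
    p_m_1 = (s + c1) / (s * (k + 1) + c1.sum(axis=0))           # P(m neighbors | label = 1)
    p_m_0 = (s + c0) / (s * (k + 1) + c0.sum(axis=0))           # P(m neighbors | label = 0)
    idx_te = nn.kneighbors(X_test, return_distance=False)[:, :k]
    m_te = Y_train[idx_te].sum(axis=1)                          # neighbor counts for test points
    cols = np.arange(q)
    post1 = prior1 * p_m_1[m_te, cols]                          # unnormalized posterior, label = 1
    post0 = (1 - prior1) * p_m_0[m_te, cols]                    # unnormalized posterior, label = 0
    return (post1 > post0).astype(int)                          # MAP decision per label
```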
